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Late Pliocene and Early Pleistocene epochs 3.6 to 0.8 million years ago’ had climates 
resembling those forecasted under future warming’. Palaeoclimatic records show 
strong polar amplification with mean annual temperatures of 11-19 °C above 
contemporary values**. The biological communities inhabiting the Arctic during 

this time remain poorly known because fossils are rare’. Here we report an ancient 
environmental DNA‘ (eDNA) record describing the rich plant and animal assemblages 
of the Kap København Formation in North Greenland, dated to around two million 
years ago. The record shows an open boreal forest ecosystem with mixed vegetation 
of poplar, birch and thuja trees, as well as a variety of Arctic and boreal shrubs and 
herbs, many of which had not previously been detected at the site from macrofossil 
and pollen records. The DNA record confirms the presence of hare and mitochondrial 
DNA from animals including mastodons, reindeer, rodents and geese, all ancestral 

to their present-day and late Pleistocene relatives. The presence of marine species 
including horseshoe crab and green algae support a warmer climate than today. The 
reconstructed ecosystem has no modern analogue. The survival of such ancient DNA 
probably relates to its binding to mineral surfaces. Our findings open newareas of 
genetic research, demonstrating that it is possible to track the ecology and evolution 
of biological communities from two million years ago using ancient eDNA. 


The Kap København Formationis located in Peary Land, North Greenland 
(82° 24’ N 22°12’ W) in what is now a polar desert. The upper depositional 
sequence contains well-preserved terrestrial animal and plant remains 
washed into an estuary during a warmer Early Pleistocene interglacial 
cycle’ (Fig. 1). Nearly 40 years of palaeoenvironmental and climate 
research at the site provide a unique perspective into a period when 


the site was situated at the boreal Arctic ecotone with reconstructed 
summer and winter average minimum temperatures of 10 °C and -17 °C 
respectively—more than 10 °C warmer than the present’. These 
conditions must have driven substantial ablation of the Greenland 
Ice Sheet, possibly producing one of the last ice-free intervals’ in the 
last 2.4 million years (Myr). Although the Kap København Formation is 
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Fig. 1| Geographical location and depositional sequence. a. Location of Kap 
København Formation in North Greenland at the entrance to the Independence 
Fjord (82° 24’ N 22°12’ W) and locations of other Arctic Plio-Pleistocene 
fossil-bearing sites (red dots). b, Spatial distribution of the erosional remnants 
of the 100-m thick succession of shallow marine near-shore sediments between 
Mudderbugt and the low mountains towards the north (a+ b refers to location 


known to yield well-preserved macrofossils from a coniferous boreal 
forest and arich insect fauna, few traces of vertebrates have been found. 
To date, these comprise remains from lagomorph genera, their copro- 
lites and Aphodius beetles, which live in and on mammalian dung”. 
However, the approximately 3.4 Myr old Fyles Leaf bed and Beaver 
Pond on Ellesmere Island in Arctic Canada preserve fossils of mammals 
that potentially could have colonized Greenland, such as the extinct 
bear (Protarctos abstrusus), extinct beavers (Dipoides sp.), the small 
canine Eucyon and Arctic giant camelines*”” (similar to Paracamelus). 
Whether the Nares Strait was a sufficient barrier to isolate northern 
Greenland from colonization by this fauna remains an open question. 

The Kap København Formation is formally subdivided into two mem- 
bers’ (Fig. 1). The lower Member A consists of up to 50 m of laminated 
mud with an Arctic ostracod, foraminifera and mollusc fauna deposited 
in an offshore glaciomarine environment”. The overlying Member 
B consists of 40-50 m of sandy (units B1 and B3) and silty (unit B2) 
deposits (Extended Data Fig. 1), including thin organic-rich beds with 
an interglacial macrofossil fauna that were deposited closer to the 
shore ina shallow marine or estuarine environment represented by 
upper and lower shoreface sedimentary facies’. 

The specific depositional environments are also reflected in the min- 
eralogy of the units, where the proximal B3 locality has the lowest clay 
and highest quartz contents (Sample compositions in Supplementary 
Tables 4.2.1 and 4.2.2 and unit averages in Supplementary Tables 4.2.3 
and 4.2.4). The architecture of the basin infill suggests that Member B 
units thicken towards the present coast—thatis, distal to the sediment 
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74a and 74b). c, Glacial-interglacial division of the depositional succession of 
clay Member A and units B1, B2 and B3 constituting sandy Member B. Sampling 
intervals for all sites are projected onto the sedimentary succession of locality 
50. Sedimentological log modified after ref. ’. Circled numbers on the map 
mark sample sites for environmental DNA analyses, absolute burial dating and 
palaeomagnetism. Numbered sites refer to previous publications”? 


source in the low mountains in the north (Fig. 1). Abundant organic 
detritus horizons are recorded in units B1 and B3, which also contain 
beds rich in Arctic and boreal plant and invertebrate macrofossils, as 
well as terrestrial mosses™””. Therefore, the taphonomy of the DNA 
most probably reflects the biological communities eroded from arange 
of habitats, fluvially transported to the foreshore and concentrated as 
organic detritus mixed into sandy near-shore sediments within units 
Bland B3. Conversely, the deeper water facies from Member A and 
unit B2 have a stronger marine signal. This scenario is supported by 
the similarities in the mineralogic composition between Kap Køben- 
havn Formation sediments and Kim Fjelde sediments (Supplementary 
Tables 4.2.1 and 4.2.5). 


Geological age 

Aseries of complementary studies has successively narrowed the depo- 
sitional age bracket of the Kap København Formation from 4.0-0.7 Myr 
toa20,000-year-long age bracket around 2.4 Myr (see Supplementary 
Information, sections 1-3). This was achieved by a combination of pal- 
aeomagnetism, biostratigraphy and allostratigraphy’’*®. Notably, 
the last appearance data of the mammals, foraminifera and molluscs in 
the stratigraphic record showan age close to 2.4 Myr (see Supplemen- 
tary Information, section 2). Within this overall framework, we add new 
palaeomagnetic data showing that Member A has reversed magnetic 
polarity and the main part of the overlying unit B2 has normal magnetic 
polarity. In the context of previous work, this is consistent with three 
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Fig. 2 | Age proxies for the Kap København Formation. a, Revised 
palaeomagnetic analysis shows unit B2 to have normal polarity and unlocks 
three possible age scenarios (S1-S3) including Members A (blue) and B (brown). 
Normal polarity is coloured black and reverse polarity is shown in white. 

Ja, Jaramillo; Co, Cobb Mountain; Ol, Olduvai; Fe, Feni; Ka, Kaena; Ma, 
Mammoth. b, Presence and last appearance datum (LAD) for marine 
foraminifera Cibicides grossus, rabbit-genus Hypolagus and the mollusc Arctica 
islandica inthe High Arctic, Northern Hemisphere and North Greenland, 
respectively. The blue band onthe far right indicates the age range for Member 
A estimated from amino acid ratios on shells’. c, Convolved probability 
distribution functions for cosmogenic burial ages calculated for two different 


magnetostratigraphic intervals in the Early Pleistocene where there 
is a reversal: 1.93 Myr (scenario 1), 2.14 Myr (scenario 2) or 2.58 Myr 
(scenario 3) (Supplementary Information, section 1). Furthermore, we 
constrain the age using cosmogenic %Al:®Be burial dating of Member B 
at four sites in this study (Supplementary Information, section 3). The 
recommended maximum burial age for the Kap København Formation 
is 2.70 + 0.46 Myr (Fig. 2; Methods). However, we discard the older 
scenario 3 as it contradicts the evidence for a continuous sedimen- 
tation across Members A and B during a single glacial-interglacial 
depositional cycle”!*118. This leaves two possible scenarios (scenarios 
land 2), in which scenario 1 supports an age of 1.9 Myr and scenario 2 
supports an age of 2.1 Myr. 


DNA preservation 

DNA degrades with time owing to microbial enzymatic activity, 
mechanical shearing and spontaneous chemical reactions such as 
hydrolysis and oxidation”. The oldest known DNA obtained to date 


production ratios (7.42 (black) and 6.75 (blue)). The dashed line and the solid 
line show the distributions for steady erosion and zero erosion, respectively. 
These distributions are all maximum ages. d, Molecular dating of Betula sp. 
yielding a median age of the DNA in the sediment of 1.323 Myr, with whiskers 
confining the 95% height posterior density (HPD) of 0.68 to 2.02 Myr (blue 
density plot), running Markov chain Monte Carlo estimation for 100 million 
iterations. The red dot is the median molecular age estimate found using the 
Mastodon mitochondrial genome restricting to radiocarbon-dated specimens, 
whereas the green area includes molecular clock estimated specimens in 
BEAST, running Markov chain Monte Carlo estimation for 400 million 
iterations. Whiskers confine the 95% HPD. 


has been recovered from a permafrost-preserved mammoth molar 
dated to 1.2-1.1 Myr using geological methods and 1.7 Myr (95% high- 
est posterior density, 2.1-1.3 Myr) using molecular clock dating”. 
To explore the likelihood of recovering DNA from sediments at the Kap 
København formation, we calculated the thermal age of the DNA and 
its expected degree of depurination at the Kap København Formation. 
Using the mean average temperature” (MAT) of -17 °C, we found a 
thermal age of 2.7 thousand years for DNA at a constant 10 °C, which 
is 741 times less than the age of 2.0 Myr (Supplementary Information, 
section 4 and Supplementary Table 4.4.1). Using the rate of depurina- 
tion from Moa bird fossils”, we found it plausible that DNA with an 
average size of 50 base pairs (bp) could survive at the Kap København 
Formation, assuming that the site remained frozen (Supplementary 
Information, section 4 and Supplementary Table 4.4.2). Mechanisms 
that preserve DNA in sediments are likely to be different from that of 
bone. Adsorption at mineral surfaces modifies the DNA conforma- 
tion, probably impeding molecular recognition by enzymes, which 
effectively hinders enzymatic degradation” ”. To investigate whether 
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Fig. 3 | Early Pleistocene plants of northern Greenland. Taxonomic profiles 
ofthe plant assemblage found in the metagenomes. Taxa in bold are genera 
only found as DNA and not as macrofossil or pollen. Asterisks indicate those 
that are found at other Pliocene Arctic sites. Extinct species as identified by 


the minerals found in Kap København Formation could have retained 
DNA during the deposition and preserved it, we determined the min- 
eralogic composition of the sediments using X-ray diffraction and 
measured their adsorption capacities. Our findings highlight that 
the marine depositional environment favours adsorption of extra- 
cellular DNA on the mineral surfaces (Supplementary Information, 
section 4 and Supplementary Table 4.3.1.1). Specifically, the clay miner- 
als (9.6-5.5 wt%) and particularly smectite (1.2-3.7 wt%), have higher 
adsorption capacity compared to the non-clay minerals (59-75 wt%). 
At a DNA concentration representative of the natural environments” 
(4.9 ng mI" DNA), the DNA adsorption capacity of smectite is 200 times 
greater than for quartz. We applied a sedimentary eDNA extraction 
protocol” on our mineral-adsorbed DNA samples, and retrieved only 
5% of the adsorbed DNA from smectite and around 10% from the other 
clay minerals (Methods and Supplementary Information, section 4). 
By contrast, we retrieved around 40% of the DNA adsorbed to quartz. 
The difference in adsorption capacity and extraction yield from the 
different minerals demonstrates that mineral composition may have 
animportant role in ancient eDNA preservation and retrieval. 


Kap København metagenomes 


We extracted DNA” from 41 organic-rich sediment samples at five 
different sites within the Kap København Formation (Supplementary 
Information, section 6 and Source Data 1), which were converted into 
65 dual-indexed Illumina sequencing libraries”. First, we tested 34 of 
the 65 libraries for plant plastid DNA by screening for the conserved 
photosystem II D2 (psbD) gene using droplet digital PCR (ddPCR) witha 
gene-targeting primer and probe spanning a39-bp region and a P7 index 
primer. Further, we screened for the psbA gene using a similar assay 
targeting the Poaceae (Methods and Supplementary Fig. 6.12.1). Aclear 
signal in 31 out of 34 samples tested confirmed the presence of plant 
plastid DNA in these libraries (Source Data 1, sheets 5 and 6). Addition- 
ally, we subjected 34 of the 65 libraries to mammalian mtDNA capture 
enrichment using the Arctic PaleoChip 1.07 and shotgun sequenced 
all libraries (initial and captured) using the Illumina HiSeq 4000 and 
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Rosaceae that are not present as a reference genome. 


NovaSeq 6000. A total of 16,882,114,068 reads were sequenced, which 
after adaptor trimming, filtering for >30 bp anda minimum phred qual- 
ity of 30 and duplicate removal resulted in 2,873,998,429 reads. These 
were analysed for k-mer comparisons using simka” (Supplementary 
Information, section 6) and then parsed for taxonomic classification 
using competitive mapping with HOLI (https://github.com/miwipe/ 
KapCopenhagen.git), which includes a recently published dataset of 
more than 1,500 genome skims of Arctic and boreal plant taxa? (Meth- 
ods and Supplementary Information, section 6). Considering the age of 
the samples and thus the potential genetic distance to recent reference 
genomes, we allowed each read to havea similarity between 95-100% 
for it to be taxonomically classified using ngsLCA”. The metaDMG 
(v.0.14.0) program” was subsequently used to quantify and filter each 
taxonomic node for postmortem DNA damage for all the metagenomic 
samples (Methods). This method estimates the average damage at 
the termini position (D-max) and a likelihood ratio (A-LR) that quanti- 
fies how much better the damage model (that is, more damage at the 
beginning of the read) fits the data compared with a null model (that 
is,aconstant amount of damage; see Supplementary Information, sec- 
tion 6). We found the DNA damage to be highly increased, especially 
for eukaryotes (mean D-max = 40.7%, see Supplementary Information, 
section 6). From this we set D-max 225% as a filtering threshold for a 
taxonomic node to be parsed for further downstream analysis as well 
as aA-LR higher or equal to 1.5. We furthermore set a threshold requir- 
ing that the minimum number of reads per taxon exceeded the median 
of reads assigned across all taxa divided by two to filter for taxa in low 
abundance. Similarly, forasample to be considered, the total number 
of reads for asample had to exceed the median number of reads per 
sample divided by two, to filter for samples with fewest reads. Lastly, 
we filtered out taxa with fewer than three replicates and subsequently 
reads were normalized by conversion to proportions (Figs. 3 and 4a). 


DNA, pollen and macrofossils comparison 


Greenland’s coasts extend from around 60° to 83° Nand include biocli- 
matic zones from the subarctic to the northern polar desert®”**. There 
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Fig. 4 | Early Pleistocene animals of northern Greenland. a, Taxonomic 
profiles ofthe animal assemblage from units B1, B2 and B3. Taxa in bold are 
genera only found as DNA. b, Phylogenetic placement and pathPhynder®? 


are 175 vascular plant genera native to Greenland, excluding histori- 
cally introduced species” *. Of these, 70 (40%) were detected by the 
metagenomic analysis (Fig. 3); the majority of these genera are today 
confined to bioclimatic zones well to the south of Kap Kébenhavn’s 
polar desert (see ref. ° and references therein), for example, all aquatic 
macrophytes. Reads assigned to Salix, Dryas, Vaccinium, Betula, Carex 
and Fquisetum dominate the assemblage, and of these genera, Equise- 
tum, Dryas, Salix arctica and two species of Carex (Carex nardina and 
Carex stans) grow there currently, whereas only afew records of Vaccin- 
ium uliginosum are found above 80°N, and Betula nana are found above 
74° N (ref. “). Out of the 102 genera detected in the Kap København 
ancient eDNA assemblage, 39% no longer grow in Greenland but do 
occur in the North American boreal (for example, Picea and Populus) 
and northern deciduous and maritime forests (for example, Crataegus, 
Taxus, Thuja and Filipendula). Many of the plant genera in this diverse 
assemblage do not occur on permafrost substrates and require higher 
temperatures than those at any latitude on Greenland today. 

In addition to the DNA, we counted pollen in six samples from locality 
119, unit B3 (Methods and Supplementary Fig. 5.1.1). Percentages were 
calculated for 4 of the samples with pollen sums ranging from 71-225 
terrestrial grains (mean = 170.25). Upland herbs, including taxa in the 
Cyperaceae, Ericales and Rosaceae comprised around 40% of sample 4. 
Samples 5 and 6 were dominated by arboreal taxa, particularly Betula. 
The Polypodiopsida (for example, Equisetum, Asplenium and Athyrium 

filix-mas) and Lycopodiopsida (Lycopodium annotinum and Selaginella 
rupestris) were also well represented and comprised over 30% of the 
assemblage in samples 1, 4 and 6. 

A total of 39 plant genera out of the 102 identified by DNA also 

occurred as macrofossils or pollen at the genus level. A further 39 
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results of mitochondrial reads uniquely classified to Elephantidae or lower 
(Source Data 1). Extinct species as identified by either macrofossils or 
phylogenetic placements are marked with a dagger. 


taxa were potentially identified as macrofossil or pollen but not to the 
same taxonomic level” (Source Data 1, sheets 1 and 2). For example, 
12 genera of Poaceae were identified by DNA (Alopecurus, Anthox- 
anthum, Arctagrostis, Arctophila, Calamagrostis, Cinna, Dupontia, 
Hordelymus, Leymus, Milium, Phippsia and Poa), of these only Horde- 
lymusis not found in the Arctic today (http://panarcticflora.org/), but 
these were only distinguished to family level in the pollen analysis and 
only one Poaceae macrofossil was found. There were 24 taxa that were 
recorded only as DNA. These included the boreal tree Populus and a few 
shrubs and dwarf shrubs, but mainly herbaceous plants. Of the 73 plant 
genera recovered as macrofossils™®™", only 24 were not detected in the 
DNA analysis. Because macrofossils and DNA have similar taphono- 
mies—as both are deposited locally—more overlap is expected between 
them than between DNA and pollen, whichis typically dispersed region- 
ally“. Nine of the taxa absent in DNA were bryophytes, probably owing 
to poor representation of this group within the genomic reference 
databases. Furthermore, the extinct taxon Araceae is not present inthe 
reference databases. The remaining undetected genera were vascular 
plants, and all except two (Oxyria and Cornus) were rare in the macro- 
fossil record. Because the detection of rare taxa is challenging in both 
macrofossil and DNA records*, we argue that this overlap between the 
DNA and macrofossil records is as high as can be expected on the basis 
of the limitations of both methods. 

An additional 19 taxa were recorded in the pollen record presented 
here and in that of Bennike* including four trees or shrubs, five ferns, 
three club mosses, and one each of algae, fungi and liverwort. We also 
find pollen from anemophilous trees, particularly gymnosperms, which 
can be distributed far north of the region where the plants actually 
grow”. Bennike* also notes a high proportion of club mosses and ferns 
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and suggests they may be overrepresented owing to their spore wall 
being resistant to degradation. Furthermore, if these taxa were pref- 
erentially distributed along streams flowing into the estuary, their 
spores could be relatively more concentrated in the alluvium than the 
pollen of more generally distributed taxa. Thus, both decay resistance 
and alluvial deposition could contribute to the relative frequencies we 
observe. This same alluvial dynamic might also have contributed to the 
very large read counts for Salix, Betula, Populus, Carex and Equisetumin 
the metagenomic record, implying that neither the proportion of these 
taxa in the pollen records nor read counts necessarily correlate with 
their actual abundance in the regional vegetation in terms of biomass 
or coverage. 

Finally, we sought to date the age of the plant DNA by phylogenetic 
placement of the chloroplast DNA. We examined data for the genera 
Betula, Populus and Salix, because these had both sufficiently high chlo- 
roplast genome coverage (with mean depth 24.16, 57.06 and 27.04x, 
respectively) and sufficient present-day whole chloroplast reference 
sequences (Methods). Owing to their age and hence potential genetic 
distance from the modern reference genomes, we lowered the similar- 
ity threshold of uniquely classified reads to 90% and merged these by 
unit to increase coverage. Both Betula and Salix placed basally to most 
of the represented species in the respective genera, and the Populus 
placement results showed support for a mixture of different species 
related to P. trichocarpa and P. balsamifera (Extended Data Figs. 7-9). 

We used the Betula chloroplast reads for a molecular dating analysis, 
because they were placed confidently on a single edge of the phylo- 
genetic tree (that is, not a mixture as in Populus), had a large number 
of reference sequences, and had high coverage in the ancient sample. 
We used BEAST” v1.10.4 to obtain a molecular clock date estimate for 
our ancient Betula chloroplast sample (see Methods, ‘Molecular dat- 
ing methods for details). We included 31 modern Betula and one Alnus 
chloroplast reference sequences, used only sites that had a depth of 
at least 20 in the ancient sample, and included a previously estimated 
Betula-Alnus chloroplast divergence time“ of 61.1 Myr for calibration of 
the root node. Our BEAST analysis was robust to both different priors on 
the age of the ancient sample, and to different nucleotide substitution 
models (Extended Data Fig. 10). This yielded a median age estimate of 
1.323 Myr, with a 95% HPD of (0.6786, 2.0172) Myr (Fig. 2). 


Animal DNA results 


The metazoan mitochondrial and nuclear DNA record was much less 
diverse than that of the plants but contained one extinct family, one 
that is absent from Greenland today, and four vertebrate genera native 
to Greenland as well as representatives of four invertebrate families 
(Fig. 4a). Assignments were based on incomplete and variable repre- 
sentation of reference genomes, so we identified reads to family level, 
and only where sufficient mitochondrial reads were present, we refined 
the assignment to genus level by matching these into mitochondrial 
phylogenies based on more complete present-day mitochondrial 
sequences (Supplementary Information, section 6). As for the plant 
reads, uniquely classified animal reads with more than 90% similarity 
were parsed and merged by unit to increase coverage for phylogenetic 
placement. 

Most notably, we found reads in unit B2 and B3 assigned to the family 
Elephantidae, which includes elephants and mammoths, but taxo- 
nomically not mastodon (Mammut sp.)—which are, however, inthe NCBI 
taxonomy, and therefore our analysis reads classified to Elephantidae 
or below therefore include Mammut sp. A consensus genome of our 
Elephantidae mitochondrial reads falls on the Mammut sp. branch 
(Fig. 4b) and is placed basal to all clades of mastodons. However, we note 
that this placement within the mastodons depends on only two transi- 
tion single-nucleotide polymorphisms (SNPs), with the first one sup- 
ported by a read depth of three and the second by only one (Extended 
Data Fig. 4, Methods and Supplementary Information, section 6). 
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Furthermore, we attempted dating the recovered mastodon mito- 
chondrial genome using BEAST*. We implemented two dating 
approaches, one was based on using radiocarbon-dated specimens 
alone, while the other used radiocarbon- and molecular-dated masto- 
dons. The first analysis yielded a median age estimate for our mastodon 
mitogenome of 1.2 Myr (95% HPD: 191,000 yr-3.27 Myr), the second 
approach resulted in a median age estimate of 5.2 Myr (95% HPD: 
1.64-10.1 Myr) (Supplementary Fig. 6.8.5 and Supplementary Informa- 
tion, section 6). 

Similarly, reads assigned to the Cervidae support a basal place- 
ment on the Rangifer (reindeer and caribou) branch (Extended Data 
Fig. 3). Mitochondrial reads mapping to Leporidae (hares and rabbits) 
place near the base to the Eurasian hare clade (Extended Data Fig. 2), 
which is the only mammal found in the fossil record’. Lepus, specifi- 
cally Lepus arcticus, is also the only genus in the Leporidae living in 
Greenland today. Mitochondrial reads assigned to Cricetidae cover 
only one informative transversion SNP, which places them as deriv- 
ing from the subfamily Arvicolinae (voles, lemmings and muskrats) 
(Extended Data Fig. 6). For the only avian taxon represented in our 
dataset—Anatidae, the family of geese and swans—we found a robust 
basal placement to the genus Branta of black geese, supported by three 
transversion SNPs with read depths ranging between two and four 
(Extended Data Fig. 5). The refined vertebrate assignments based on 
mitochondrial references are more biogeographically conserved than 
for plants. Dicrostonyx—specifically Dicrostonyx groenlandicus (the 
Nearctic collared lemming)—is the only genus of the Cricetidae native 
to Greenland today, just as Rangifer—specifically Rangifer tarandus 
groenlandicus (the barren-ground caribou)—is the only member of 
the Cervidae. The mastodon is the exception, as no member of the 
Elephantidae lives in present-day Greenland. 


Ancient DNA from marine organisms 


The other metazoan taxa identified in the DNA record were a single 
reef-building coral (Merulinidae) and several arthropods, with matches 
to two insects—Formicidae (ants) and Pulicidae (fleas)—and one marine 
family—Limulidae (horseshoe crabs). This is somewhat unexpected, 
given the richinsect macrofossil record from the Kap København For- 
mation, which comprises more than 200 species, including Formica 
sp. The marine taxa are less abundant than the terrestrial taxa, and 
no mitochondrial DNA was identified from marine metazoans. The 
read lengths, DNA damage and the fact that the reads assigned distrib- 
ute evenly across the reference genomes suggests that these are not 
artefacts but may be over-matched DNA sequences of closely related, 
potentially extinct species within the families that are currently absent 
from our reference databases owing to poor taxonomic representation. 
By contrast, Limulidae, in the subphylum Chelicerata, is unlikely to 
be misidentified as this distinct genus is the only surviving member 
within its order and thus deeply diverged from other extant organisms. 

The probable source of these reads is a population of Limulus 
polyphemus, the only Atlantic member of the genus, which would 
have spawned directly onto the sediment as it accumulated. Today 
this genus does not spawn north of the Bay of Fundy (about 45° N), 
suggesting warmer surface water conditions in the Early Pleistocene 
at Kap København consistent with the +8 °C annual sea surface tem- 
perature anomaly reconstructed for the Pleistocene of the coast of 
northeast Greenland™. By aligning our reads against the Tara Oceans 
eukaryotic metagenomic assembled genomes (SMAGs) data (Meth- 
ods), we further reveal the presence of 24 marine planktonic taxa in 
14 samples, covering both zooplankton and phytoplankton (Fig. 5). 
These detected SMAGs belong to the supergroups Opisthokonta (6), 
Stramenopila (15) and Archaeplastida (3). The majority of these signals 
are from SMAGs associated with cold regions in the modern ocean 
(that is, the Arctic Ocean and Southern Ocean), suchas diatoms (Bacil- 
lariophyta), Chrysophyceae and the MAST-4 group (Supplementary 
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Fig. 5| Marine planktonic eukaryotes identified at the Kap København 
Formation. a, Detection of SMAGs and average damage (D-max) of a SMAG 
within a member unit. Top, the SMAG distribution in contemporary oceans 
based on the data of Delmont et al.®. The SMAGs are ordered on the basis of 


Table 6.11.1), as we expected. However, a few are cosmopolitan, whereas 
others, such as Archaeplastida (green microalgae), have an oceanic 
signal that is today confined to more temperate waters in the Pacific 
Ocean (Fig. 5). Although we do not know whether modern day ecologies 
can be extrapolated to ancient ecosystems, the abundance of green 
microalgae is believed to be increasing in Arctic regions, which tends 
to be associated with warming surface waters. 


Discussion 


The Kap København ancient eDNA record is extraordinary for sev- 
eral reasons; the upper limit of the 95% highest posterior density of 
the estimated molecular age is 2.0 Myr and independently supports 
a geological age of approximately 2 Myr (Fig. 2). This implies that the 
DNA is considerably older than any previously sequenced DNA”. Our 
DNA results detected five times as many plant genera as previous stud- 
ies using shotgun sequencing of ancient sediments””**", which is 
well within the range of the richest northern boreal metabarcoding 
records”. The accuracy of the assignments is strengthened by the 
observation that 76% of the taxa identified to the level of genus or 
family also occurred in macrofossil and/or pollen assemblages from 
thesame units. Our results demonstrate the potential of ancient envi- 
ronmental metagenomics to reconstruct ancient environments, phy- 
logenetically place and date ancient lineages from diverse taxa from 
around 2 Ma (Supplementary Information, section 6). Finally, the DNA 
identified a set of additional plant genera, which occur as macrofossils 
at other Arctic Late Pliocene and Early Pleistocene sites (Figs. 1 and 3 
and Supplementary Information, section 5) but not as fossils at Kap 
København, thereby expanding the spatiotemporal distribution of 
these ancient floras. 


phylogenomic inference from Delmont et al.®. b-d, Distribution of DNA 
damage among the taxonomic supergroup Opisthokonta (b), Stramenopila 
(c) and Archaeplastida (d) (Source Data1). 


Of note, the detection of both Rangifer (reindeer and caribou) and 
Mammut (mastodon) forces a revision of earlier palaeoenvironmental 
reconstructions based on the site’s relatively impoverished faunal 
record, entailing both higher productivity and habitat diversity for 
much of the deposition period. Because all the vertebrate taxa identi- 
fied by DNA are herbivores, their representation may be a function 
of relative biomass (see discussion on taphonomy in Supplementary 
Information, section 6). Caribou, geese, hares and rodents can all be 
abundant, at least seasonally, in boreal environments. Additionally, 
the excrement of large herbivores (such as caribou and particularly 
mastodons) can bea significant component of sediments. By contrast, 
carnivores are not represented, consistent with their smaller total 
biomass. This dynamic also explains the dominance of plant reads 
over metazoans and to some extent differences in representation of 
various plant genera (Supplementary Information, section 6). In the 
general absence of fossils, DNA may prove the most effective tool for 
reconstructing the biogeography of vertebrates through the Early 
Pleistocene. DNA from mastodon must imply a viable population of this 
large browsing megaherbivore, which would require a more produc- 
tive boreal habitat than that inferred in earlier reconstructions based 
primarily on plant macrofossils’. Mastodon dung from asite in central 
Nova Scotia from around 75,000 years ago contained macrofossils from 
sedges, cattail, bulrush, bryophytes and even charophytes, but was 
dominated by spruce needles and birch samaras™. The Kap København 
units with mastodon DNA yielded macrofossils and DNA from Betula 
as well as more thermophilic arboreal taxa including Thuja, Taxus, 
Cornus and Viburnum, none of which range into Greenland’s hydric 
Arctic tundra or polar deserts today. The co-occurrence of these taxa 
in multiple units compels a revision of previous temperature estimates 
as well as the presence of permafrost. 


Nature | Vol 612 | 8 December 2022 | 289 


Article 


No single modern plant community or habitat includes the range 
of taxa represented in many of the macrofossil and DNA samples 
from Kap København. The community assemblage represents a 
mixture of modern boreal and Arctic taxa, which has no analogue 
in modern vegetation’”». To some degree, this is expected, as the 
ecological amplitudes of modern members of these genera have been 
modified by evolution”. Furthermore, the combination of the High 
Arctic photoperiod with warmer conditions and lower atmospheric 
CO, concentrations® made the Early Pleistocene climate of North 
Greenland very different from today. The mixed character of the ter- 
restrial assemblage is also reflected in the marine record, where Arctic 
and more cosmopolitan SMAGs of Opistokonta and Stramenopila 
are found together with horseshoe crabs, corals and green micro- 
algae (Archaeplastida), which today inhabit warmer waters at more 
southern latitudes. 

Megaherbivores, particularly mastodons, could have had a signifi- 
cant impact on an interglacial taiga environment, even providing a 
top-down trophic control on vegetation structure and composition 
at this high latitude. The presence of mastodons*”*’ coupled with the 
absence of anthropogenic fire, which has had a role in some Holocene 
boreal habitats”, are important differences. Another important fac- 
tor is the proximity and biotic richness of the refugia from which 
pioneer species were able to disperse into North Greenland when 
conditions became favourable at the beginning of interglacials. 
The shorter duration of Early Pleistocene glaciations produced less 
extensive ice sheets allowing colonization from relatively species-rich 
coniferous-deciduous woodlands in northeastern Canada”®°. More 
extensive glaciation later in the Pleistocene increasingly isolated North 
Greenland and later re-colonizations were from increasingly distant 
and/or less diverse refugia. 

In summary, we show the power of ancient eDNA to add substan- 
tial detail to our knowledge of this unique, ancient open boreal forest 
community intermixed with Arctic species, acommunity composition 
that has no modern analogues and included mastodons and reindeer, 
among others. Similar detailed flora and vertebrate DNA records may 
survive at other localities. If recovered, these would advance our under- 
standing of the variability of climate and biotic interactions during the 
warmer Early Pleistocene epochs across the High Arctic. 
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Methods 


Sampling 

Sediment samples were obtained from the Kap København Formationin 
North Greenland (82° 24’ 00” N 22°12’ 00” W) inthe summers of 2006, 
2012 and 2016 (see Supplementary Table 3.1.1). Sampled material con- 
sisted of organic-rich permafrost and dry permafrost. Prior to sampling, 
profiles were cleaned to expose fresh material. Samples were hereafter 
collected vertically from the slope of the hills either using a10 cm diam- 
eter diamond headed drill bit or cutting out ~40 x 40 x 40 cm blocks. 
Sediments were kept frozen in the field and during transportation to 
the lab facility in Copenhagen. Disposable gloves and scalpels were used 
and changed between each sample to avoid cross-contamination. Ina 
controlled laboratory environment, the cores and blocks were further 
sub-sampled for material taking only the inner part of sediment cores, 
leaving 1.5-2 cm between the inner core and the surface that provided 
asubsample of approximately 6-10 g. Subsequently, all samples were 
stored at temperatures below -22 °C. 

We sampled organic-rich sediment by taking samples and biological 
replicates across the three stratigraphic units B1, B2 and B3, spanning 5 
different sites, site: 50 (B3), 69 (B2), 74a (B1), 74b (B1) and 119 (B3). Each 
biological replicate from each unit at each site was further sampled in 
different sublayers (numbered LO-L4, Source Data 1, sheet 1). 


Absolute age dating 

In 2014, Be and Al oxide targets from 8x 1 kg quartz-rich sand samples 
collected at modern depths ranging from 3 to 21 m below stream cut 
terraces were analysed by accelerator mass spectrometry and the cos- 
mogenic isotope concentrations interpreted as maximum ages using 
a simple burial dating approach! (Al: Be versus normalized Be). 
The *°Al and Be isotopes were produced by cosmic ray interactions 
with exposed quartz in regolith and bedrock surfaces in the moun- 
tains above Kap København prior to deposition. We assume that the 
?6A]:!°Be was uniform and steady for long time periods in the upper 
few metres of these gradually eroding palaeo-surfaces. Once eroded 
by streams and hillslope processes, the quartz sand was deposited in 
sandy braided stream sediment, deltaic distributary systems, or the 
near-shore environment and remained effectively shielded from cosmic 
ray nucleons buried (many tens of metres) under sediment, intermit- 
tent ice shelf or ice sheet cover, and—at least during interglacials—the 
marine water column until final emergence. The simple burial dating 
approach assumes that the sand grains experienced only one burial 
event. If multiple burial events separated by periods of re-exposure 
occurred, then the starting *°Al:Be before the last burial event would 
be less than the initial production ratio (6.75 to 7.42, see discussion 
below) owing to the relatively faster decay of %Al during burial, and 
therefore the calculated burial age would be amaximum limiting age. 
Multiple burial events can be caused by shielding by thick glacier icein 
the source area, or by sediment storage in the catchment prior to final 
deposition. These shielding events mean that the *°Al:"°Be is lower, 
and therefore a calculated burial age assuming the initial production 
ratio would overestimate the final burial duration. We also consider 
that once buried, the sand grains may have been exposed to second- 
ary cosmogenic muons (their depth would be too great for subma- 
rine nucleonic production). As sedimentation rates in these glaciated 
near-shore environments are relatively rapid, we show that even the 
muonic production would be negligible (see Supplemental Informa- 
tion). However, once the marine sediments emerged above sea level, 
in-situ production by both nucleogenic and muogenic production 
could alter the “Al:"°Be. The “°Al versus Be isochron plot reveals this 
complex burial history (Supplementary Information, section 3) and 
the concentration versus depth composite profiles for both *%Al and 
Be reveal that the shallowest samples may have been exposed during 
a period of time (~15,000 years ago) that is consistent with deglacia- 
tion in the area (Supplemental Information). While we interpret the 


individual simple burial age of all samples as a maximum limiting age 
of deposition of the Kap København Formation Member B, we recom- 
mend using the three most deeply shielded samples ina single depth 
profile to minimize the effect of post-depositional production. We 
then calculate a convolved probability distribution age for these three 
samples (KKO6A, B and C). However, this calculation depends on the 
?6Al:°Be production ratio we use (that is, between 6.75 and 7.42) and 
on whether we adjust for erosion in the catchment. So, we repeat the 
convolved probability distribution function age for the lowest and 
highest production ratio and zero to maximum possible erosion rate, to 
obtain the minimum and maximum limiting age range at loconfidence 
(Supplementary Information, section 3). Taking the midpoint between 
the negative and positive 30 confidence limits, we obtain a maximum 
burial age of 2.70 + 0.46 Myr. This age is also supported by the position 
of those three samples on the isochron plot, which suggests the true 
age may not be significantly different that this maximum limiting age. 


Thermal age 

The extent of thermal degradation of the Kap København DNA was com- 
pared tothe DNA fromthe Krestovka Mammoth molar. Published kinetic 
parameters for DNA degradation™ were used to calculate the relative 
rate difference over a given interval of the long-term temperature record 
and to quantify the offset from the reference temperature of 10 °C, 
thus estimating the thermal age in years at 10 °C for each sample (Sup- 
plementary Information, section 4). The mean annual air temperature 
(MAT) for the the Kap København sediment was taken from Funder et al. 
(2001)° and for the Krestovka Mammoth the MAT was calculated using 
temperature data from the Cerskij Weather Station (WMO no. 251230) 
68.80° N 161.28° E, 32 m from the International Research Institute Data 
Library (https://iri.columbia.edu/) (Supplementary Table 4.4.1). 

We did not correct for seasonal fluctuation for the thermal age 
calculation of the Kap København sediments or from the Krestovka 
Mammoth. We do provide theoretical average fragment length for 
four different thermal scenarios for the DNA in the Kap København 
sediments (Supplementary Table 4.4.2). A correction in the thermal 
age calculation was applied for altitude using the environmental lapse 
rate (6.49 °C km). We scaled the long-term temperature model of 
Hansen et al. (2013)® to local estimates of current MATs by a scaling 
factor sufficient to account for the estimates of the local temperature 
decline at the last glacial maximum and then estimated the integrated 
rate using an activation energy (Ea) of 127 kJ mol” (ref. “). 


Mineralogic composition 

The minerals in each of the Kap København sediment samples were 
identified using X-ray diffraction and their proportions were quantified 
using Rietveld refinement. The samples were homogenized by grinding 
~1 g of sediment with ethanol for 10 min in a McCrone Mill. The samples 
were dried at 60 °C and added corundum (CR-1, Baikowski) as the inter- 
nal standard toa final concentration of 20.0 wt%. Diffractograms were 
collected using a Bruker D8 Advance (O-O geometry) and the LynxEye 
detector (opening 2.71°), with Cu K,, radiation (1.54 A; 40 kV, 40 mA) 
using a Ni-filter with thickness of 0.2 mm on the diffracted beam and 
a beam knife set at 3 mm. We scanned from 5-90° 20 with a step size 
of 0.1° anda step time of 4 s while the sample was spun at 20 rpm. The 
opening of the divergence slit was 0.3° and of the antiscatter slit 3°. 
Primary and secondary Soller slits had an opening of 2.5° and the open- 
ing of the detector window was 2.71°. For the Rietveld analysis, we used 
the Profex interface for the BGMN software”. The instrumental 
parameters and peak broadening were determined by the fundamental 
parameters ray-tracing procedure®. A detailed description of identi- 
fication of clay minerals can be found in the supporting information. 


Adsorption 
We used pure or purified minerals for adsorption studies. The minerals 
used and treatments for purifying them are listed in Supplementary 


Table 4.2.6. The purity of minerals was checked using X-ray diffraction 
with the same instrumental parameters and procedures as listed in the 
above section i.e., mineralogical composition. Notes on the origin, 
purification and impurities can be found in the Supplementary Infor- 
mation section 4. We used artificial seawater® and salmon sperm DNA 
(low molecular weight, lyophilized powder, Sigma Aldrich) as amodel 
for eDNA adsorption. A known amount of mineral powder was mixed 
with seawater and sonicated in an ultrasonic bath for 15 min. The DNA 
stock was then added to the suspension to reach a final concentra- 
tion between 20-800 ug mI. The suspensions were equilibrated ona 
rotary shaker for 4 h. The samples were then centrifuged and the DNA 
concentration in the supernatant determined with UV spectrometry 
(Biophotometer, Eppendorf), with both positive and negative controls. 
All measurements were done in triplicates, and we made five to eight 
DNA concentrations per mineral. We used Langmuir and Freundlich 
equations to fit the model to the experimental isotherm and to obtain 
adsorption capacity of a mineral at a given equilibrium concentration. 


Pollen 

The pollen samples were extracted using the modified Grischuk pro- 
tocol adopted in the Geological Institute of the Russian Academy of 
Science which utilizes sodium pyrophosphate and hydrofluoric acid”. 
Slides prepared from 6 samples were scanned at 400x magnification 
with a Motic BA 400 compound microscope and photographed usinga 
Moticam 2300 camera. Pollen percentages were calculated as a propor- 
tion of the total palynomorphs including the unidentified grains. Only 
4 of the 6 samples yielded terrestrial pollen counts 250. In these, the 
total palynomorphs identified ranged from 225 to 71 (mean = 170.25; 
median = 192.5). Identifications were made using several published 
keys”. The pollen diagram was initially compiled using Tilia version 
1.5.12” but replotted for this study using Psimpoll 4.10”. 


DNA recovery 

For recovery calculation, we saturated mineral surfaces with DNA. For 
this, we used the same protocol as for the determination of adsorption 
isotherms with an added step to remove DNA not adsorbed but only 
trapped in the interstitial pores of wet paste. This step was important 
because interstitial DNA would increase the amount of apparently 
adsorbed DNA and overestimate the recovery. To remove trapped DNA 
after adsorption, we redispersed the minerals in seawater. The process of 
redispersing the wet paste in seawater, ultracentrifugation and removal 
of supernatant lasted less than 2.5 min. After the second centrifugation, 
the wet pastes were kept frozen until extraction. We used the same extrac- 
tion protocol as for the Kap København sediments. After the extraction, 
the DNA concentration was again determined using UV spectrometry. 


Metagenomes 

A total of 41 samples were extracted for DNA” and converted to 65 
dual-indexed Illumina sequencing libraries (including 13 negative 
extraction- and library controls)*°. 34 libraries were thereafter subjected 
to ddPCR using a QX200 AutoDG Droplet Digital PCR System (Bio-Rad) 
following manufacturer’s protocol. Assays for ddPCR include a P7 
index primer (5’-AGCAGAAGACGGCATAC-3’) (900nM), gene-targeting 
primer (900 nM), and a gene-targeting probe (250nM). We screened 
for Viridiplantae psbD (primer: 5’-TCATAAT TGGACGT TGAACC-3’, 
probe: 5’-(FAM)ACTCCCATCATATGAAA(BHQ]1)-3’) and Poaceae 
psbA (primer: 5’-CTCACAACTTCCCTCTAGAC-3’, probe 5’-(HEX) 
AGCTGCTGTTGAAGTTC(BHQ]1)-3’). Additionally, 34 of the 65 librar- 
ies were enriched using targeted capture enrichment, for mammalian 
mitochondrial DNA using the PaleoChip Arcticl.0 bait-set™ and all librar- 
ies were hereafter sequenced on an Illumina HiSeq 4000 80 bp PEora 
NovaSeq 6000 100 bp PE. We sequenced a total of 16,882,114,068 reads 
which, after low complexity filtering (Dust = 1), quality trimming (q > 25), 
duplicate removal and filtering for reads longer than 29 bp (only paired 
read mates for NovaSeq data) resulted in 2,873,998,429 reads that were 


parsed for further downstream analysis. We next estimated kmer simi- 
larity between all samples using simka” (setting heuristic count for max 
number of reads (-max-reads 0) and a kmer size of 31 (-kmer-size 31)), 
and performed a principal component analysis (PCA) on the obtained 
distance matrix (see Supplementary Information, ‘DNA’). We hereaf- 
ter parsed all QC reads through HOLI” for taxonomic assignment. To 
increase resolution and sensitivity of our taxonomic assignment, we 
supplemented the RefSeq (92 excluding bacteria) and the nucleotide 
database (NCBI) witha recently published Arctic-boreal plant database 
(PhyloNorway) and Arctic animal database* as well as searched the NCBI 
SRA for 139 genomes of boreal animal taxa (March 2020) of which 16 
partial-full genomes were found and added (Source Data1, sheet 4) and 
used the GTDB microbial database version 95 as decoy. All alignments 
were hereafter merged using samtools and sorted using gz-sort (v. 1). 
Cytosine deamination frequencies were then estimated using the newly 
developed metaDMG, by first finding the lowest common ancestor 
across all possible alignments for each read and then calculating dam- 
age patterns for each taxonomic level** (Supplementary Information, 
section 6). In parallel, we computed the mean read length as well as 
number of reads per taxonomic node (Supplementary Information, 
section 6). Our analysis of the DNA damage across all taxonomic levels 
pointed toa minimum filter for all samples at all taxonomic levels with 
a D-max > 25% and a likelihood ratio (A-LR) > 1.5. This ensured that only 
taxa showing ancient DNA characteristics were parsed for downstream 
profiling and analysis and resulted in no taxa within any controls being 
found (Supplementary Information, section 6). 


Marine eukaryotic metagenome 
We sought to identify marine eukaryotes by first taxonomically 
labelling all quality-controlled reads as Eukaryota, Archaea, Bacte- 
ria or Virus using Kraken 2” with the parameters ‘--confidence 0.5 
--minimum-hit-groups 3’ combined with an extra filtering step that 
only kept those reads with root-to-leaf score >0.25. For the initial Kraken 
2 search, we used a coarse database created by the taxdb-integration 
workflow (https://github.com/aMG-tk/taxdb-integration) covering 
all domains of life and including a genomic database of marine plank- 
tonic eukaryotes” that contain 683 metagenome-assembled genomes 
(MAGs) and 30 single-cell genomes (SAGs) from Tara Oceans”, follow- 
ing the naming convention in Delmont et al.®, we will refer to them 
as SMAGs. Reads labelled as root, unclassified, archaea, bacteria and 
virus were refined through a second Kraken 2 labelling step using a 
high-resolution database containing archaea, bacteria and virus cre- 
ated by the taxdb-integration workflow. We used the same Kraken 2 
parameters and filtering thresholds as the initial search. Both Kraken 
2 databases were built with parameters optimized for the study read 
length (--kmer-len 25 --minimizer-len 23 --minimizer-spaces 4). 
Reads labelled as eukaryota, root and unclassified were hereafter 
mapped with Bowtie2” against the SMAGs. We used MarkDuplicates 
from Picard (https://github.com/broadinstitute/picard) to remove 
duplicates and then we calculated the mapping statistics for each 
SMAG inthe BAM files with the filterBAM program (https://github.com/ 
aMG-tk/bam-filter). We furthermore estimated the postmortem damage 
of the filtered BAM files with the Bayesian methods in metaDMG and 
selected those SMAGs with a D-max = 0.25 and a fit quality (A-LR) higher 
than 1.5. The SMAGs with fewer than 500 reads mapped, a mean read 
average nucleotide identity (ANI) of less than than 93% and a breadth of 
coverage ratio and coverage evenness of less than 0.75 were removed. We 
followed a data-driven approach to select the mean read ANI threshold, 
where we explored the variation of mapped reads as a function of the 
mean read ANI values from 90% to 100% and identified the elbow pointin 
the curve (Supplementary Fig. 6.11.1). We used anvi’o” in manual mode 
to plot the mapping and damage results using the SMAGs phylogenomic 
tree inferred by Delmont et al. as reference. We used the oceanic signal of 
Delmontetal.asa proxy tothe contemporary distribution of the SMAGs 
in each ocean and sea (Fig. 5 and Supplementary Information, section 6). 
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Comparison of DNA, macrofossil and pollen 

To allow comparison between records in DNA, macrofossil and pollen, 
the taxonomy was harmonized following the Pan Arctic Flora check- 
list and NCBI. For example, since Bennike (1990)"8, Potamogeton 
has been split into Potamogeton and Stuckenia, Polygonym has been 
split to Polygonum and Bistorta, and Saxifraga was split to Saxifraga 
and Micranthes, whereas others have been merged, such as Melan- 
drium with Silene“. Plant families have changed names—for instance, 
Gramineae is now called Poaceae and Scrophulariaceae has been 
re-circumscribed to exclude Plantaginaceae and Orobancheae®”. 
We then classified the taxa into the following: category 1 all identical 
genus recorded by DNA and macrofossils or pollen, category 2 gen- 
era recorded by DNA also found by macrofossils or pollen including 
genus contained within family level classifications, category 3 taxa 
only recorded by DNA, category 4 taxa only recorded by macrofossils 
or pollen (Source Data 1). 


Phylogenetic placement 

We sought to phylogenetically place the set of ancient taxa with 
the most abundant number of reads assigned, and with a sufficient 
number of reference sequences to build a phylogeny. These taxa 
include reads mapped to the chloroplast genomes of the flora gen- 
era Salix, Populus and Betula, and to the mitochondrial genomes 
of the fauna families Elephantidae, Cricetidae, Leporidae, as well 
as the subfamilies Capreolinae and Anserinae. Although the evolu- 
tion of the chloroplast genome is somewhat less stable than that of 
the plant mitochondrial genome, it has a faster rate of evolution, 
and is non-recombining, and hence is more likely to contain more 
informative sites for our analysis than the plant mitochondria®™. Like 
the mitochondrial genome, the chloroplast genome also has a high 
copy number, so that we would expect a high number of sedimentary 
reads mapping to it. 

For each of these taxa, we downloaded a representative set of either 
whole chloroplast or whole mitochondrial genome fasta sequences 
from NCBI Genbank®, including a single representative sequence from 
arecently diverged outgroup. For the Betula genus, we also included 
three chloroplast genomes from the PhyloNorway database***?. We 
changed all ambiguous bases in the fasta files to N. We used MAFFT™* 
to align each of these sets of reference sequences, and inspected mul- 
tiple sequence alignments in NCBI MSAViewer to confirm quality®. We 
trimmed mitochondrial alignments with insufficient quality due to 
highly variable control regions for Leporidae, Cricetidae and Anserinae 
by removing the d-loop in MegaX®. 

The BEAST suite” was used with default parameters to create ultra- 
metric phylogenetic trees for each of the five sets of taxa from the mul- 
tiple sequence alignments (MSAs) of reference sequences, which were 
converted from Nexus to Newick format in Figtree (https://github.com/ 
rambaut/figtree). We then passed the multiple sequence alignments 
to the Python module AlignIO from BioPython® to create a reference 
consensus fasta sequence for each set of taxa. Furthermore, we used 
SNPSites* to create a vcf file from each of the MSAs. Since SNPSites 
outputs a slightly different format for missing data than needed for 
downstream analysis, we used a custom R script to modify the vcf for- 
mat appropriately. We also filtered out non-biallelic SNPs. 

From the damage filtered ngsLCA output, we extracted all readIDs 
uniquely classified to reference sequences within these respective 
taxa or assigned to any common ancestor inside the taxonomic group 
and converted these back to fastq files using seqtk (https://github. 
com/lh3/seqtk). We merged reads from all sites and layers to create a 
single read set for each respective taxon. Next, since these extracted 
reads were mapped against a reference database including multiple 
sequences from each taxon, the output files were not on the same coor- 
dinate system. To circumvent this issue and avoid mapping bias, we 
re-mapped each read set to the consensus sequence generated above 


for that taxon using bwa® with ancient DNA parameters (bwa aln -n 
0.001). We converted these reads to bam files, removed unmapped 
reads, and filtered for mapping quality > 25 using samtools”. This 
produced 103,042, 39,306, 91,272, 182 and 129 reads for Salix, Populus, 
Betula, Elephantidae and Capreolinae, respectively. 

We next used pathPhynder™, a phylogenetic placement algorithm 
that identifies informative markers on a phylogeny from a refer- 
ence panel, evaluates SNPs in the ancient sample overlapping these 
markers, and traverses the tree to place the ancient sample accord- 
ing to its derived and ancestral SNPs on each branch. We used the 
transversions-only filter to avoid errors due to deamination, except for 
Betula, Salix and Populus in which we used no filter due to sufficiently 
high coverage. Last, we investigated the pathPhynder output in each 
taxon set to determine the phylogenetic placement of our ancient 
samples (see Supplementary Information for discussion on phyloge- 
netic placement). 

Based on the analysis described above we further investigated the 
phylogenetic placement within the genus Mammut, or mastodons. To 
avoid mapping reference biases in the downstream results, we first built 
a consensus sequence from all comparative mitochondrial genomes 
used in said analysis and mapped the reads identified in ngsLCA as 
Elephantidae to the consensus sequence. Consensus sequences were 
constructed by first aligning all sequences of interest using MAFFT** 
and taking a majority rule consensus base in Geneious v2020.0.5 
(https://www.geneious.com). We performed three analyses for phy- 
logenetic placement of our sequence: (1) Comparison against a single 
representative from each Elephantidae species including the sea cow 
(Dugong dugon) as outgroup, (2) Comparison against a single repre- 
sentative from each Elephantidae species, and (3) Comparison against 
all published mastodon mitochondrial genomes including the Asian 
elephant as outgroup. 

For each of these analyses we first built a new reference tree using 
BEAST v1.10.4 (ref. *”) and repeated the previously described path- 
Phynder steps, with the exception that the pathPhynder tree path analy- 
sis for the Mammut SNPs was based on transitions and transversions, 
not restricting to only transversions due to low coverage. 


Mammut americanum. We confirmed the phylogenetic placement of 
our sequence using a selection of Elephantidae mitochondrial refer- 
ence sequences, GTR+G, strict clock, a birth-death substitution model, 
and ran the MCMC chain for 20,000,000 runs, sampling every 20,000 
steps. Convergence was assessed using Tracer” v1.7.2 and an effective 
sample size (ESS) > 200. To determine the approximate age of our recov- 
ered mastodon mitogenome we performed a molecular dating analysis 
with BEAST” v1.10.4. We used two separate approaches when dating 
our mastodon mitogenome, as demonstrated ina recent publication”. 
First, we determined the age of our sequence by comparing against a 
dataset of radiocarbon-dated specimens (n = 13) only. Secondly, we 
estimated the age of our sequence including both molecularly (n = 22) 
and radiocarbon-dated (n = 13) specimens using the molecular dates 
previously determined”. We utilized the same BEAST parameters as 
Karpinski et al.” and set the age of our sample witha gamma distribution 
(5% quantile: 8.72 x 10*, Median: 1.178 x 10°; 95% quantile: 5.093 x 10°; 
initial value: 74,900; shape: 1; scale: 1,700,000). In short, we specified 
a substitution model of GTR+G4, a strict clock, constant population 
size, and ran the Markov Chain Monte Carlo chain for 50,000,000 
runs, sampling every 50,000 steps. Convergence of the run was again 
determined using Tracer. 


Molecular dating methods 

Inthis section, we describe molecular dating of the ancient birch (Betula) 
chloroplast genome using BEAST v1.10.4 (ref. *’). In principle, the genera 
Betula, Populus and Salix had both sufficiently high chloroplast genome 
coverage (with mean depth 24.16x, 57.06 and 27.04, respectively, 
although this coverage is highly uneven across the chloroplast genome) 


and enough reference sequences to attempt molecular dating onthese 
samples. Notably, this is one of the reasons we included a recently 
diverged outgroup with a divergence time estimate in each of these 
phylogenetic trees. However, our Populus sample clearly contained a 
mixture of different species, as seen from its inconsistent placement 
inthe pathPhynder output. In particular, there were multiple support- 
ing SNPs to both Populus balsamifera and Populus trichocarpa, and 
both supporting and conflicting SNPs on branches above. Further- 
more, upon inspection, our Salixsample contained a surprisingly high 
number of private SNPs which is inconsistent with any ancient or even 
modernage, especially considering the number of SNPs assigned to the 
edges of the phylogenetic tree leading to other Salix sequences. We are 
unsure what causes this inconsistency but hypothesize that our Salix 
sample is also a mixed sample, containing multiple Salix species that 
diverged from the same placement branch on the phylogenetic tree 
at different time periods. This is supported by looking at all the reads 
that cover these private SNP sites, which generally appear to be from 
a mixed sample, with reads containing both alternate and reference 
alleles present at a high proportion in many cases. Alternatively, or 
potentially jointly in parallel, this could be a consequence of the high 
number of nuclear plastid DNA sequences (NUPTs) in Salix”. Because 
of this, we continued with only Betula. 

First, we downloaded 27 complete reference Betula chloroplast 
genome sequences and a single Alnus chloroplast genome sequence 
to use as an outgroup from the NCBI Genbank repository, and sup- 
plemented this with three Betula chloroplast sequences from the 
PhyloNorway database generated in a recent study”, for a total of 31 
reference sequences. Since chloroplast sequences are circular, down- 
loaded sequences may not always be in the same orientation or at the 
same starting point as is necessary for alignment, so we used custom 
code (https://github.com/miwipe/KapCopenhagen) that uses an 
anchor string to rotate the reference sequences to the same orienta- 
tion and start them all from the same point. We created a MSA of these 
transformed reference sequences with Mafft* and checked the quality 
of our alignment by eye in Seqotron™ and NCBI MsaViewer. Next, we 
called a consensus sequence from this MSA using the BioAlign consen- 
sus function®’ in Python, which is a majority rule consensus caller. We 
will use this consensus sequence to map the ancient Betula reads to, 
both to avoid reference bias and to get the ancient Betula sample on 
the same coordinates as the reference MSA. 

From the last common ancestor output in metaDMG*, we extracted 
read sets for all units, sites and levels that were uniquely classified to the 
taxonomic level of Betula or lower, with at a minimum sequence similar- 
ity of 90% or higher to any Betula sequence, using Seqtk”. We mapped 
these read sets against the consensus Betula chloroplast genome using 
BWA® with ancient DNA parameters (-o 2 -n 0.001 -t 20), then removed 
unmapped reads, quality filtered for read quality 225, and sorted the 
resulting bam files using samtools®. For the purpose of molecular dat- 
ing, itis appropriate to consider these read sets as asingle sample, and 
so we merged the resulting bam files into one sample using samtools. 
We used bcftools®’ to make an mpileup and calla vcf file, using options 
for haploidy and disabling the default calling algorithm, which can 
slightly biases the calls towards the reference sequence, in favour of 
a majority call on bases that passed the default base quality cut-off of 
13. We included the default option using base alignment qualities”®, 
which we found greatly reduced the read depths of some bases and 
removed spurious SNPs around indel regions. Lastly, we filtered the 
vcf file to include only single nucleotide variants, because we do not 
believe other variants such as insertions or deletions in an ancient 
environmental sample of this type to be of sufficiently high confidence 
to include in molecular dating. 

We downloaded the gff3 annotation file for the longest Betula ref- 
erence sequence, MG386368.1, from NCBI. Using custom R code”, 
we parsed this file and the associated fasta to label individual sites as 
protein-coding regions (in which we labelled the base with its position in 


the codon according to the phase and strand noted in the gff3 file), RNA, 
or neither coding nor RNA. We extracted the coding regions and checked 
inSeqotron™ and R that they translated toa protein alignment well (for 
example, no premature stop codons), both in the reference sequence and 
the associated positions in the ancient sequence. Though the modern 
reference sequence’s coding regions translated to a high-quality protein 
alignment, translating the associated positions in the ancient sequence 
with no depth cut-off leads to premature stop codons and an overall poor 
quality protein alignment. On the other hand, when using a depth cut-off 
of 20 and replacing sites in the ancient sequence which did not meet this 
filter with N, we see a high-quality protein alignment (except for the N 
sites). We also interrogated any positions in the ancient sequence which 
differed from the consensus, and found that any suspicious regions (for 
example, with multiple SNPs clustered closely together spatially in the 
genome) were removed with a depth cut-off of 20. Because of this, we 
moved forward only with sites in both the ancient and modern samples 
which met a depth cut-off of at least 20 in the ancient sample, which 
consisted of about 30% of the total sites. 

Next, we parsed this annotation through the multiple sequence align- 
ment to create partitions for BEAST”. After checking how many poly- 
morphic and total sites were in each, we decided to use four partitions: 
(1) sites belonging to protein-coding positions 1 and 2, (2) coding posi- 
tion 3, (3) RNA, or (4) non-coding and non-RNA. To ensure that these were 
high confidence sites, each partition also only included those positions 
which had at least depth 20 in the ancient sequence and had less than 
3 total gaps in the multiple sequence alignment. This gave partitions 
which had 11,668, 5,828, 2,690 and 29,538 sites, respectively. We used 
these four partitions to run BEAST” v1.10.4, with unlinked substitution 
models for each partition and a strict clock, with a different relative 
rate for each partition. (There was insufficient information in these 
data to infer between-lineage rate variation from a single calibration). 
We assigned an age of 0 to all of the reference sequences, and used a 
normal distribution prior with mean 61.1 Myr and standard deviation 
1.633 Myr for the root height*’; standard deviation was obtained by 
conservatively converting the 95% HPD toz-scores. For the overall tree 
prior, we selected the coalescent model. The age of the ancient sequence 
was estimated following the overall procedures of Shapiro et al. (2011)°8. 
To assess sensitivity to prior choice for this unknown date, we used two 
different priors, namely a gamma distribution metric towards a younger 
age (shape = 1, scale = 1.7); anda uniform prior on the range (0,10 Myr). 
We also compared two different models of rate variation among sites 
and substitution types within each partition, namely aGTR+G with four 
rate categories, and base frequencies estimated from the data, and the 
much simpler Jukes Cantor model, which assumed no variation between 
substitution types nor sites within each partition. All other priors were 
set at their defaults. Neither rate model nor prior choice had a qualita- 
tive effect on results (Extended Data Fig. 10). We also ran the coding 
regions alone, since they translated correctly and are therefore highly 
reliable sites and found that they gave the same median and a much 
larger confidence interval, as expected when using fewer sites (Extended 
Data Fig. 10). We ran each Markov chain Monte Carlo for a total of 100 
million iterations. After removing a burn-in of the first 10%, we verified 
convergence in Tracer” v1.7.2 (apparent stationarity of traces, and all 
parameters having an Effective Sample Size > 100). We also verified that 
the resulting MCC tree from TreeAnnotator” had placed the ancient 
sequence phylogenetically identically to pathPhynder® placement, 
which is shown in Extended Data Fig. 9. For our major results, we report 
the uniform ancient age prior, and the GTR+G, model applied to each of 
the four partitions. The associated XML is given in Source Data 3. The 
95% HPD was (2.0172,0.6786) for the age of the ancient Betula chloro- 
plast sequence, with a median estimate of 1.323 Myr, as shown in Fig. 2. 


Reporting summary 
Further information on research design is available in the Nature Port- 
folio Reporting Summary linked to this article. 
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Data availability 


Raw sequence data (13,135,646,556 reads following adapter trimming) are 
available through the ENA project accession PRJEB55522. Pollen counts 
are available through https://github.com/miwipe/KapCopenhagen.git. 
Source data are provided with this paper. 


Code availability 


All code used is available at https://github.com/miwipe/KapCopen- 
hagen.git. 
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Extended Data Fig. 1| Setting A. Type locality 50 indicating units in formation cleaning for ancient eDNA samples. E. Sampling in the permafrost within unit B3 
b. Overview locality 74a+b with a complete sediment sequence. C. Overview of at locality 50. F. Organic rich sediment at the base of mega-scale cross-bedding 
locality 69. D. Detail of organic rich sediment in unit B3 before excavation and within unit B2 at locality 74a+b. White circles mark persons for scale. 
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Extended Data Fig. 2 | Phylogenetic placement results of Leporidae 
mitochondrial reads, using transversion SNPs only. Reads have been 
merged fromall layers and sites. The green numbers on each edge represent 
the number of supporting (+) SNPs, whereas the red numbers indicate 
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conflicting (-) SNPs in the ancient Leporidae environmental mitochondrial 
genome overlapping the reference SNPs assigned to the respective edge. There 
isa clear placement for the ancient Leporidae environmental mitochondrial 
genome onthe edge marked +2, basal to the extant Lepus lineage. 
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Extended Data Fig. 3 | Phylogenetic placement results for representatives mitochondrial genome overlapping the reference SNPs assigned to the 
of the Capreolinae mitochondrial reads, using transversion SNPs only. respective edge. There is a clear placement for the ancient Capreolinae 
Reads have been merged fromall layers and sites. The green numbers oneach environmental mitochondrial genome on the edge marked +8/-3, basal to the 
edge represent the number of supporting (+) SNPs, whereas the red numbers Rangifer genus. 
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Extended Data Fig. 4 | Phylogenetic placement of Elephantidae 
mitochondrial reads within mastodons (Mammutamericanum), using 
Elephas maximus as outgroup, including transitions and transversion 
SNPs. (Please note that the NCBI taxonomy includes the Mammut genus 
within Elephantidae). The reference dataset consisted of mitochondria from 
mastodons (Mammut americanum) only and one Elephas maximus as an 
outgroup. Reads have been merged from all layers and sites. The green 
numbers on each edge represent the number of supporting (+) SNPs, whereas 


the red numbers indicate conflicting (-) SNPs in the ancient Elephantidae 
environmental mitochondrial genome overlapping the reference SNPs 
assigned to the respective edge. There is a placement for the ancient 
Elephantidae environmental mitochondrial genome on the edge marked +2/-1, 
identifying it as basal to the mastodon (Mammut americanum) clade, which 
contains most of all mastodon reference mitochondrial genomes. Please note 
that this placement is based on two transition SNPs witha read depth of three 
reads per SNP. 


== KR821135.1 Cygnus cygnus 
~3 NC 027095.1 Cygnus cygnus 
KP981363.1 Cygnus cygnus 
T KF800698.1 Cygnus columbianus jankowskii 
-10 NC 017604.1 Cygnus columbianus bewickii 
JQ282800.1 Cygnus columbianus bewickii 


NC 007691.1 Cygnus columbianus Cygnus 
annir AAA =I DQ083161.1 Cygnus columbianus 
+1 NC 012843.1 Cygnus atratus 
ETA FJ379295.1 Cygnus atratus 


NC 027096.1 Cygnus olor 
-4 KP981364.1 Cygnus olor 


KR867679.1 Cygnus olor 
CM020139.1 Cygnus olor 
KJ680301.1 Branta bernicla 
NC 007011.1 Branta canadensis | Branta 
DQ019124.1 Branta canadensis 
NC 016922.1 Anser fabalis 
HQ890328.1 Anser fabalis 
KP238480.1 Anser cygnoides 
MK102803.1 Anser cygnoides 
NC 025654.1 Anser indicus 
<7 KM455570.1 Anser indicus 
a MK133021.1 Anser anser 
-1 NC 023832.1 Anser cygnoides 
I TI KJ124555.1 Anser cygnoides 
“T 


-4 
+3 5S 
-9 


NC 011196.1 Anser anser 
+1 EU932689.1 Anser anser 
Ea KP943133.1 Anser cygnoides 
MK133022.1 Anser cygnoides Anser 
KU211647.1 Anser cygnoides 
KJ794189.1 Anser cygnoides 
KT427463.1 Anser cygnoides 
True branch length KY767671.1 Anser cygnoides 
KP881611.1 Anser cygnoides 
—— _ Uniquely supporting SNPs KP026178.1 Anser cygnoides 
KJ794188.1 Anser cygnoides 
KJ778677.1 Anser cygnoides 
MN122908.1 Anser anser 
-1 NC 039888.1 Anser albifrons frontalis 
MH000287.1 Anser albifrons frontalis 
“2, NC 004539.1 Anser albifrons 
AF363031.1 Anser albifrons 


bl 


== Uniquely conflicting SNPs 


Extended Data Fig. 5| Phylogenetic placement of mitochondrial reads Anatidae environmental mitochondrial genome overlapping the reference 
assigned within Anatidae and placed with representatives ofthe Anatidae, SNPs assigned tothe respective edge. Thereis a clear placement for the ancient 
using transversion SNPs only. Reads have been merged fromall layers and Anatidae environmental mitochondrial genome on the edge marked +3, basal 
sites. The green numbers on each edge represent the number of supporting (+) to the Branta genus. 

SNPs, whereas the red numbers indicate conflicting (-) SNPs inthe ancient 
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Extended Data Fig. 6 | Phylogenetic placement results of Cricetidae 
mitochondrial reads, using transversion SNPs only. Reads have been 
merged from all layers and sites. The green numbers on each edge represent 
the number of supporting (+) SNPs, whereas the red numbers in the red circles 
indicate conflicting (-) SNPs in the ancient Cricetidae environmental 
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Cricetinae © 


MT483992.1 Ellobius lutescens 
MT483991.1 Ellobius fuscocapillus 
MT118739.1 Prometheomys schaposchnikowi 
+ MN792943.1 Dicrostonyx torquatus 
MN792985.1 Dicrostonyx groenlandicus 
KX683880.1 Dicrostonyx hudsonius 
MK482363.1 Myodes rutilus 
KF918859.1 Myodes glareolus 
KT725595.1 Myodes rufocanus 
NC 016427.1 Myodes regulus 
JN629046.1 Eothenomys regulus 
KU200225.1 Eothenomys inez 
KX014874.1 Eothenomys miletus 
KP997311.1 Eothenomys melanogaster 
+ FJ483847.1 Eothenomys chinensis 
NC 013563.1 Proedromys liangshanensis 
FJ463038.1 Proedromys liangshanensis 
e MT614217.1 Lasiopodomys mandarinus 
MN614478.1 Lasiopodomys brandtii 
MN199170.1 Lasiopodomys gregalis 
+ NC 003041.1 Microtus kikuchii 
MW659922.1 Microtus montebelli 
NC 015243.1 Microtus fortis calamorum 
JF261175.1 Microtus fortis calamorum 
e MK805519.1 Microtus fortis pelliceus 
NC 015241.1 Microtus fortis fortis 
JF261174.1 Microtus fortis fortis 
NC 016055.1 Neodon irene 
MG833880.1 Neodon fuscus 
KU891252.1 Neodon sikimensis 
NC 027945.1 Microtus ochrogaster 
NC 049220.1 Microtus richardsoni 
+ MN058078.1 Microtus chrotorrhinus 
NC 041250.1 Microtus agrestis 
e MN058077.1 Microtus cabrerae 
MN326850.1 Terricola subterraneus 
+ MN058079.1 Microtus thomasi 
MG948434.1 Microtus arvalis 
NC 008064.1 Microtus levis 
DQ015676.1 Microtus rossiaemeridionalis 


ş 


Arvicolinae 


m+ 


+ MG182016.1 Neotoma magister 
e KY707300.1 Neotoma mexicana 
KU745736.1 Neotoma fuscipes 
KY707312.1 Isthmomys pirrensis 
KY707307.1 Reithrodontomys mexicanus 
KU168563.1 Onychomys leucogaster 
KY707308.1 Peromyscus crinitus 
KY707305.1 Peromyscus megalops 
NC 035611.1 Peromyscus melanophrys 
KY707303.1 Peromyscus mexicanus ` 
KY707310.1 Neotomodon alstoni Neotominae 
+ KY707302.1 Podomys floridanus 
KY707306.1 Peromyscus aztecus 
KY707304.1 Habromys ixtlani 
MT078819.1 Peromyscus eremicus 
KY707309.1 Peromyscus pectoralis 
KY707299.1 Peromyscus attwateri 
MH260579.1 Peromyscus maniculatus bairdii 
KY707301.1 Peromyscus polionotus 
BK010700.1 Peromyscus leucopus 


mitochondrial genome overlapping the reference SNPs assigned to the 
respective edge. There is a placement for the ancient Cricetidae environmental 
mitochondrial genome on the edge marked +1, basal to the Arvicolinae 
subfamily. 


+16/-1660 


Extended Data Fig. 7 | Phylogenetic placement results for our Populus 
chloroplast reads, using both transition and transversion SNPs, and using 
reads merged fromall layers and sites. The numbers on each edge represent 
the number of supporting (+) and conflicting (-) SNPs in the ancient Populus 
environmental genome overlapping the reference SNPs assigned to that edge. 
The ancient Populus environmental genome clearly contains a mixture of 
different species. The most likely placement is on the edge above Populus 
trichocarpa (NC 009143.1) and Populus balsamifera (NC 024735.1), with +71/-15 
supporting and conflicting SNPs. However, we find some support for both 
branches directly leading to these species as well. Populus balsamifera and 


+11/-228 


NC 031371.1 Populus ilicifolia 


42/-107 
e MT482538.1 Populus trinervis 


+13/-47 
+1/-133 


oe MT482535.1 Populus tremula 


+8/-63 
(@7482542.1 Populus rotundifolia 
+3/-77 
+1/-13 
(® MT407464.1 Populus davidiana 
+10/-115 
e NC 024735.1 Populus balsamifera 
+71/-15 


@ NC 009143.1 Populus trichocarpa 


+3/-57 


- MN864049.1 Populus koreana 
+4/-26 


+3/-53 
@  MT482541.1 Populus wilsonii 


MG262367.1 Salix rehderiana 


P. trichocarpa are considered sister species. They are both distributed in North 
America, as far North as Alaska, are known to hybridise both among themselves 
and other Populus species and are morphologically very similar”>””°. Previous 
analyses found a very recent nuclear genome divergence time of only 75000 
years ago for Populus trichocarpa and P. balsamifera™?, but a deep chloroplast 
genome divergence time of at least 6-7 Ma”, whichis an uncommon pattern in 
plants. Our ancient Populus sample could contain individuals either ancestral 
to, or hybridized from, the modern Populus trichocarpaand P. balsamifera 
species. 


Article 


Extended Data Fig. 8 | Phylogenetic placement results for our Salix 
chloroplast reads, using both transition and transversion SNPs, and using 
reads merged from all layers and sites. The numbers on each edge represent 
the number of supporting (+) and conflicting (-) SNPs in the ancient Salix 
environmental genome overlapping the reference SNPs assigned to that edge. 
The ancient Salix environmental genome falls basal to a main Salix clade. Our 
ancient Salix sample is phylogenetically placed, with 356 supporting SNPs and 
22 conflicting SNPs, ona basal branch leading to the main clade. Although the 
Salix chloroplast phylogeny is not considered fully resolved”, the difficulties 


MT482535.1 Populus tremula 
— MT551160.1 Salix dasyclados 
+1/-47 |MG262368.1 Salix rorida 


NC 037428.1 Salix rorida 


+356/-22 


@NC 037425.1 Salix minjiangensis 
@ 
® NC 037423.1 Salix hypoleuca 


@MT551159.1 Salix argyracea 
$1/-4 , , 
®MT551163.1 Salix suchowensis 
+ 
(®MT551161.1 Salix eriocephala 


® MG262369.1 Salix taoensis 
+2/-14 ® 
NC 037429.1 Salix taoensis 
e 
NC 037427.1 Salix rehderiana 
+3/-56 
. MG262367.1 Salix rehderiana 
+1/-18 
@ 7551 162.1 Salix integra 
+1/-34 
@® NC 037424.1 Salix magnifica 


+1/-422 


— NC 037422.1 Salix chaenomeloides 
+2/-73 
8 NC 037426.1 Salix paraplesia 


in resolution lie underneath our placement branch, and this along with the 
high number of SNPs on the placement branch allow us to be confident inthe 
placement position. Our chloroplast phylogeny agrees roughly with'™, which 
estimated a divergence date between these two main Salix clades at 16.9 Ma, 
and arootage of the first clade, to which our ancient sample is basal to, of 

8.1 Ma. It is reasonable, then, to conclude that our ancient Salixsample is at 
least 8.1 Ma diverged from modern Salix species, and probably represents an 
extinct species, or extant species without a reference genome sequenced, ora 
pool thereof. 


Extended Data Fig. 9 | Phylogenetic placement results for our Betula 

chloroplast reads, using both transition and transversion SNPs, and using 
reads merged from all layers and sites. Our ancient Betula sample was placed 
basal toa main Betula clade, based on 29 supporting (green) and 13 conflicting 


MG356709.1 Alnus rubra 
MK888853.1 Betula ainoides 
@ NC 039992.1 Betula lenta 
MG386369.1 Betula populifolia 
+3/-46 @,. 039995.1 Betula populifolia 
MG386401.1 Betula cordifolia 
zi NC 037473.1 Betula cordifolia 
LC542973.1 Betula chichibuensis 
NC 047177.1 Betula chichibuensis 
® MN830400.1 Betula costata 
@@ CC216990 Betula nana nana 
MG386370.1 Betula pubescens 
o 039996.1 Betula pubescens 


29/-13 
KX703002.1 Betula nana 


NC 033978.1 Betula nana 
@MT872529.1 Betula nana 
MH2057385.1 Betula platyphylla 
® NC 039994.1 Betula platyphylla 
43 He MG386368.1 Betula platyphylla 
© 9M7872528.1 Betula nana 
NC 039993.1 Betula occidentalis 
MG386367.1 Betula occidentalis 
1oMG674393.1 Betula halophila 
+ MT310900.1 Betula microphylla 
°9MT872524.1 Betula nana 
CDT216991 Betula fruticosa 
t MT872525.1 Betula nana 
® CCC216990ssp Betula nana 
“@ MT872530.1 Betula nana 
MT872527.1 Betula nana 
MT872526.1 Betula nana 


(red) SNPs onits placement branch, and with very low numbers of supporting 
SNPs elsewhere in the tree other than those leading to this branch. This 
placement agrees with the BEAST molecular dating analysis (see Molecular 
Dating Methods). 
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Ancient Betula chloroplast age distribution under different BEAST models 


1.254 
1.00 4 
0.754 Model 
> All sites, flat prior, GTR+I°4 
2 All sites, flat prior, Jukes—Cantor 
D 
Q All sites, gamma prior, GTR+T4 
0.504 ; . 
Coding only, flat prior, GTR+I°4 
0.254 
0.00 4 
T T T T T 
0 1 2 3 4 
Age (millions of years) 


and therefore fewer total sites, gives a larger confidence interval as expected 


Extended Data Fig 10 | Molecular age distribution. Results of running the 
Results reported in the text are for the red curve, witha flat prior andaGTR+I, 


ancient Betula chloroplast molecular dating analysis BEAST v1.10.4 (ref. *”) with 
different priors and nucleotide substitution models. Using onlycodingregions, substitution model. 
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Study description We shotgun sequenced ancient environmental DNA from sediments found at the geological formation Kap København in Greenland, 
for paleo-environmental recontruction. 


Research sample 41 samples obtained from bulk samples or directly in the profiles were used for DNA analysis. 8 1kg bulk samples were obtained for 
Cosmogenic nuclide burial dating. In addition, different types of minerals were used to test DNA adsorption and release. Sixty-nine 
samples were collected for determination of the polarity. All samples were taken during three field trips, and spanning 5 different 
localities within the same formation. 


Sampling strategy Samples were taken across the three units and from 5 different sites, within each site biological replicates were taken in the units 
both horizontally and vertically see DNA metadata. 


Data collection DNA processing was performed at Centre for GeoGenetics and sequenced at the Danish National Sequencing Centre on Illumina 
platforms ( HiSeq 4000, NovaSeq6000). 


Timing and spatial scale | DNA Data were collected from sediment samples from Kap København formation, the northern most Greenland which has been date 
to 2.0 Mya. 


Data exclusions The DNA results only includes samples that yielded sequenceable DNA. Some samples did not. 
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Reproducibility The strongest evidence for reproducibility is that this study includes replicates of geological layers from the same unit but at different 
locations within the formation (sites) and the fact that they yield highly identical taxonomic profiles. Further, we had biological 
replicates within the same site and unit, as well as technical replicates of individual samples. All yielding near to identical results. 


Randomization Randomization is not relevant. 


Blinding Blinding is not relevant, as there is no presupposed hypothesis. 
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Field work, collection and transport 


Field conditions Field works were performed by three different expedition groups. Details are supplied in Methods and SI. 


Location Kap København Formation in North Greenland (82° 24' 00" N 22° 12' 00" W) 
Access & import/export Sediment samples were collected and exported by different research groups from different countries, in agreement with the rules of 
the specific countries. All sediment samples were imported to Denmark as geological sediment samples for research, for which there 


is no specific permit required by the authorities. 


Disturbance The samples concerns small sediment samples, and didn't cause disturbance to the surrounding environment as a whole. 
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